memory utilization



Prepared for the Unknown: Adapting AIOps Capacity Forecasting Models to Data Changes

Poenaru-Olaru, Lorena, Hof, Wouter van 't, Stando, Adrian, Trawinski, Arkadiusz P., Kapel, Eileen, Rellermeyer, Jan S., Cruz, Luis, van Deursen, Arie

arXiv.org Artificial Intelligence

Abstract--Capacity management is critical for software organizations to allocate resources effectively and meet operational demands. An important step in capacity management, predicting future resource needs, often relies on data-driven analytics and machine learning (ML) forecasting models, which require frequent retraining to stay relevant as data evolves. Continuously retraining the forecasting models can be expensive and difficult to scale, posing a challenge for engineering teams tasked with balancing accuracy and efficiency. Retraining only when the data changes appears to be a more computationally efficient alternative, but its impact on accuracy requires further investigation. In this work, we investigate the effects of retraining capacity forecasting models for time series based on detected changes in the data, compared to periodic retraining. Our results show that drift-based retraining achieves forecasting accuracy comparable to periodic retraining in most cases, making it a cost-effective strategy. However, when data is changing rapidly, periodic retraining is still preferred to maximize forecasting accuracy. These findings offer actionable insights for software teams to enhance forecasting systems, reducing retraining overhead while maintaining robust performance.

The term capacity management refers to ensuring that an IT service has sufficient infrastructure and resources to meet current and future demand. Although capacity management is crucial for efficient and effective service delivery, this process used to be carried out manually by continuously collecting and analyzing data [32]. Manual techniques for predicting capacity requirements become difficult to scale as the number of capacity management data sources increases, and they are significantly time-consuming for the engineers in charge. To automate capacity management for machine utilization, such as CPU and memory, companies have started employing forecasting AIOps models, which predict resource demand in a timely fashion. This is particularly relevant for our industry partner, ING (International Netherlands Group) Bank, where operational engineers must monitor numerous time series to ensure sufficient resources are allocated for its large-scale online operations, supported by thousands of machines with varying resource demands.
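To make the trade-off concrete, here is a minimal sketch of drift-triggered retraining. The Kolmogorov-Smirnov test stands in for whichever drift detector a team prefers (the paper studies drift-based retraining in general, not this specific test), and the names `drift_detected` and `maybe_retrain` are illustrative, not the authors' API.

```python
# Minimal sketch of drift-triggered retraining (illustrative, not the
# paper's pipeline): retrain only when a two-sample test flags a change
# between a reference window and the most recent window of the series.
import numpy as np
from scipy.stats import ks_2samp

def drift_detected(reference: np.ndarray, recent: np.ndarray,
                   alpha: float = 0.05) -> bool:
    """Flag drift when a KS test rejects 'same distribution'."""
    _, p_value = ks_2samp(reference, recent)
    return p_value < alpha

def maybe_retrain(model, history: np.ndarray, window: int = 168):
    """Retrain the forecaster (any object with .fit) only on detected drift."""
    reference, recent = history[:-window], history[-window:]
    if drift_detected(reference, recent):
        model.fit(history)  # full retrain on the updated history
    return model
```

The periodic alternative simply calls `model.fit` on a fixed schedule; the drift-based variant pays for a cheap statistical test instead of a full retrain whenever the recent window looks like the reference data.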


A Real-Time Framework for Intermediate Map Construction and Kinematically Feasible Off-Road Planning Without OSM

Jerome, Otobong, Kulathunga, Geesara Prathap, Devitt, Dmitry, Murawjow, Eugene, Klimchik, Alexandr

arXiv.org Artificial Intelligence

Off-road environments present unique challenges for autonomous navigation due to their complex and unstructured nature. Traditional global path-planning methods, which typically aim to minimize path length and travel time, perform poorly on large-scale maps and fail to account for critical factors such as real-time performance, kinematic feasibility, and memory efficiency. This paper introduces a novel global path-planning method specifically designed for off-road environments, addressing these essential factors. The method begins by constructing an intermediate map within the pixel coordinate system, incorporating geographical features like off-road trails, waterways, restricted and passable areas, and trees. The planning problem is then divided into three sub-problems: graph-based path planning, kinematic feasibility checking, and path smoothing. This approach effectively meets real-time performance requirements while ensuring kinematic feasibility and efficient memory use. The method was tested in various off-road environments with large-scale maps up to several square kilometers in size, successfully identifying feasible paths in an average of 1.5 seconds and utilizing approximately 1.5 GB of memory under extreme conditions. The proposed framework is versatile and applicable to a wide range of off-road autonomous navigation tasks, including search and rescue missions and agricultural operations.
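The three sub-problems map naturally onto a short pipeline. The sketch below is a simplified stand-in, assuming graph nodes are (x, y) pixel coordinates: `networkx` shortest-path search for stage one, a crude heading-change filter in place of a real minimum-turning-radius check for stage two, and B-spline smoothing for stage three. None of the function names come from the paper.

```python
# Three-stage off-road planning pipeline (simplified stand-in, not the
# authors' implementation). Nodes are assumed to be (x, y) pixel tuples.
import math
import networkx as nx
import numpy as np
from scipy.interpolate import splprep, splev

def heading(p, q):
    return math.atan2(q[1] - p[1], q[0] - p[0])

def enforce_turning_limit(pts, max_turn_rad):
    """Drop waypoints whose heading change exceeds the limit (a crude
    proxy for a proper minimum-turning-radius feasibility check)."""
    out = [pts[0], pts[1]]
    for p in pts[2:]:
        turn = abs(heading(out[-2], out[-1]) - heading(out[-1], p))
        if min(turn, 2 * math.pi - turn) <= max_turn_rad:
            out.append(p)
    return out

def smooth(pts, n=100):
    """B-spline smoothing over the feasible waypoints (needs >= 4 points)."""
    tck, _ = splprep(np.asarray(pts, dtype=float).T, s=2.0)
    return np.stack(splev(np.linspace(0, 1, n), tck), axis=1)

def plan_path(graph: nx.Graph, start, goal, max_turn_rad=0.6):
    # 1) graph search, 2) kinematic filter, 3) smoothing
    waypoints = nx.shortest_path(graph, start, goal, weight="cost")
    return smooth(enforce_turning_limit(waypoints, max_turn_rad))
```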



Forecasting LLM Inference Performance via Hardware-Agnostic Analytical Modeling

Patwari, Rajeev, Sirasao, Ashish, Das, Devleena

arXiv.org Artificial Intelligence

Large language models (LLMs) are increasingly deployed as local agents on personal devices with CPUs, NPUs, and integrated GPUs. However, forecasting inference performance on such heterogeneous devices remains challenging due to dynamic compute and memory demands. Existing approaches rely on GPU benchmarking or machine-learning-based latency predictors, which are often hardware-specific and lack generalizability. To this end, we introduce LIFE, a lightweight analytical framework built from modular operator models, configurable to characterize LLM inference workloads in a hardware- and dataset-agnostic manner. LIFE characterizes the influence of software and model optimizations, such as quantization, KV cache compression, LoRA adapters, chunked prefill, different attention mechanisms, and operator fusion, on performance metrics such as time-to-first-token (TTFT), time-per-output-token (TPOT), and tokens-per-second (TPS). LIFE enables performance forecasting using only hardware specifications, such as TOPS and memory bandwidth, without requiring extensive dataset benchmarking. We validate LIFE's forecasts with inference on AMD Ryzen CPUs, NPUs, and iGPUs and on NVIDIA V100 GPUs, using Llama2-7B variants, demonstrating LIFE's utility in forecasting LLM performance through the lens of system efficiency to enable efficient LLM deployment across different hardware platforms.
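As a flavor of what hardware-only forecasting looks like, here is a deliberately crude roofline-style estimate (my simplification, not LIFE's operator-level model): prefill is treated as compute-bound against TOPS, decode as memory-bound against DRAM bandwidth. The ~2 FLOPs/parameter/token rule of thumb and all example numbers are assumptions.

```python
# Back-of-envelope analytical forecast from hardware specs alone
# (illustrative simplification of the approach, not LIFE itself).
def forecast(params_b: float, bytes_per_param: float, prompt_len: int,
             tops: float, mem_bw_gbs: float):
    flops_per_token = 2 * params_b * 1e9                 # ~2 FLOPs/param/token
    ttft = prompt_len * flops_per_token / (tops * 1e12)  # compute-bound prefill
    tpot = params_b * 1e9 * bytes_per_param / (mem_bw_gbs * 1e9)  # weight reads
    return ttft, tpot, 1.0 / tpot                        # TTFT(s), TPOT(s), TPS

# e.g. a 7B model at 4-bit (0.5 B/param) on a hypothetical 50 TOPS NPU
# with 120 GB/s memory: ~0.14 s TTFT, ~29 ms TPOT, ~34 TPS.
ttft, tpot, tps = forecast(7.0, 0.5, prompt_len=512, tops=50, mem_bw_gbs=120)
```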


From Computation to Consumption: Exploring the Compute-Energy Link for Training and Testing Neural Networks for SED Systems

Douwes, Constance, Serizel, Romain

arXiv.org Artificial Intelligence

The massive use of machine learning models, particularly neural networks, has raised serious concerns about their environmental impact. Indeed, over the last few years we have seen an explosion in the computing costs associated with training and deploying these systems. It is therefore crucial to understand their energy requirements in order to better integrate them into the evaluation of models, which has so far focused mainly on performance. In this paper, we study several neural network architectures that are key components of sound event detection (SED) systems, using an audio tagging task as an example. We measure the energy consumption for training and testing small to large architectures and characterize the complex relationships between energy consumption, the number of floating-point operations, the number of parameters, and GPU/memory utilization.
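One common way to obtain such energy measurements on NVIDIA hardware is to integrate sampled power draw via NVML. The sketch below uses the `pynvml` bindings and is an assumption about tooling, not necessarily the authors' measurement setup.

```python
# Sketch: estimate GPU energy for a workload by sampling power draw
# (assumes an NVIDIA GPU and the pynvml bindings; not the paper's setup).
import threading
import time
import pynvml

def measure_energy_joules(run_fn, gpu_index: int = 0, interval_s: float = 0.1):
    """Run run_fn while a background thread integrates power over time."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
    energy_mj = [0.0]            # nvmlDeviceGetPowerUsage returns milliwatts
    stop = threading.Event()

    def sampler():
        while not stop.is_set():
            energy_mj[0] += pynvml.nvmlDeviceGetPowerUsage(handle) * interval_s
            time.sleep(interval_s)

    t = threading.Thread(target=sampler)
    t.start()
    run_fn()                     # the training or testing workload
    stop.set()
    t.join()
    pynvml.nvmlShutdown()
    return energy_mj[0] / 1000.0  # mW * s -> joules
```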


Ensemble Method for System Failure Detection Using Large-Scale Telemetry Data

Mudgal, Priyanka, Wouhaybi, Rita H.

arXiv.org Artificial Intelligence

The growing reliance on computer systems, particularly personal computers (PCs), necessitates heightened reliability to uphold user satisfaction. This paper presents an in-depth analysis of extensive system telemetry data and proposes an ensemble methodology for detecting system failures. Our approach scrutinizes various system metrics, encompassing CPU utilization, memory utilization, disk activity, and CPU temperature, alongside pertinent system metadata such as system age, usage patterns, core count, and processor type. The proposed ensemble technique integrates a diverse set of algorithms, including Long Short-Term Memory (LSTM) networks, isolation forests, one-class support vector machines (OCSVM), and local outlier factors (LOF), to effectively discern system failures. Specifically, the LSTM network, together with the other machine learning techniques, is trained on Intel Computing Improvement Program (ICIP) telemetry software data to distinguish between normal and failed system patterns. Experimental evaluations demonstrate the efficacy of our models, achieving a high detection rate in identifying system failures. Our research contributes to advancing the field of system reliability and offers practical insights for enhancing user experience in computing environments.
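The unsupervised half of such an ensemble is straightforward to express with scikit-learn. The sketch below combines the three non-LSTM detectors by majority vote; the paper's features, thresholds, and LSTM branch are omitted, and `min_votes` is an illustrative knob rather than the authors' fusion rule.

```python
# Majority-vote ensemble of anomaly detectors over telemetry features
# (hedged sketch of the non-LSTM part of such an ensemble).
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from sklearn.neighbors import LocalOutlierFactor

def ensemble_flags(X_train: np.ndarray, X_test: np.ndarray, min_votes: int = 2):
    """Fit detectors on normal telemetry; flag test rows as failures
    when at least min_votes detectors call them outliers."""
    detectors = [
        IsolationForest(random_state=0).fit(X_train),
        OneClassSVM(nu=0.05).fit(X_train),
        LocalOutlierFactor(novelty=True).fit(X_train),
    ]
    # Each detector returns -1 for outliers, +1 for inliers.
    votes = sum((d.predict(X_test) == -1).astype(int) for d in detectors)
    return votes >= min_votes
```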


Achieving Pareto Optimality using Efficient Parameter Reduction for DNNs in Resource-Constrained Edge Environment

Mih, Atah Nuh, Rahimi, Alireza, Kawnine, Asfia, Palma, Francis, Wachowicz, Monica, Dubay, Rickey, Cao, Hung

arXiv.org Artificial Intelligence

This paper proposes an optimization of an existing Deep Neural Network (DNN) that improves its hardware utilization and facilitates on-device training for resource-constrained edge environments. We implement efficient parameter reduction strategies on Xception that shrink the model size without sacrificing accuracy, thus decreasing memory utilization during training. We evaluate our model in two experiments, Caltech-101 image classification and PCB defect detection, and compare its performance against the original Xception and two lightweight models, EfficientNetV2B1 and MobileNetV2. On Caltech-101 image classification, our model achieves a better test accuracy (76.21%) than Xception (75.89%), uses less memory on average (847.9 MB versus 874.6 MB), and has faster training and inference times. The lightweight models overfit, with EfficientNetV2B1 reaching 30.52% test accuracy and MobileNetV2 reaching 58.11%; both use less memory than our model and Xception. On PCB defect detection, our model has the best test accuracy (90.30%), compared to Xception (88.10%), EfficientNetV2B1 (55.25%), and MobileNetV2 (50.50%). MobileNetV2 has the lowest average memory usage (849.4 MB), followed by our model (865.8 MB), then EfficientNetV2B1 (874.8 MB), with Xception the highest (893.6 MB). We further experiment with pre-trained weights and observe that memory usage decreases, showing the benefits of transfer learning. A Pareto analysis of the models' performance shows that our optimized architecture satisfies both the accuracy and the low-memory-utilization objectives.
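The Pareto analysis itself is a simple dominance check over (accuracy, memory) pairs. The sketch below reproduces it on the PCB numbers quoted above; "ours" labels the abstract's optimized model.

```python
# Pareto-front check over (maximize accuracy %, minimize memory MB),
# using the PCB defect detection figures quoted in the abstract.
models = {
    "ours": (90.30, 865.8), "Xception": (88.10, 893.6),
    "EfficientNetV2B1": (55.25, 874.8), "MobileNetV2": (50.50, 849.4),
}

def pareto_front(entries):
    """Keep models not dominated by another with accuracy >= and memory <=."""
    front = []
    for name, (acc, mem) in entries.items():
        dominated = any(a >= acc and m <= mem and (a, m) != (acc, mem)
                        for a, m in entries.values())
        if not dominated:
            front.append(name)
    return front

print(pareto_front(models))  # -> ['ours', 'MobileNetV2']
```

Xception and EfficientNetV2B1 are both dominated by the optimized model (lower accuracy and higher memory), which is what places it on the front.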


RCMHA: Relative Convolutional Multi-Head Attention for Natural Language Modelling

Sugiharto, Herman, Aradea, Mubarok, Husni

arXiv.org Artificial Intelligence

The attention module is in common use in language modeling and presents distinct challenges within the broader scope of Natural Language Processing. Multi-Head Attention (MHA) employs absolute positional encoding, which imposes limitations on token length and entails substantial memory consumption when processing embedded inputs. The remedy proposed by prior work is relative positional encoding, as adopted in Transformer-XL or Relative Multi-Head Attention (RMHA), although those architectures still consume considerable memory. To address these challenges, this study refines MHA by combining relative positional encoding with a depth-wise convolutional layer architecture, aiming for higher accuracy with lower memory usage. The proposed RCMHA framework modifies two components: first, a depth-wise convolutional layer is applied to the input embedding, encompassing the Query, Key, and Value parameters; second, relative positional encoding is incorporated into the attention scoring phase and integrated with scaled dot-product attention. Experiments underscore the advantages of RCMHA: it exhibits superior accuracy, scoring 0.572 against alternative attention modules such as MHA, Multi-DConv-Head Attention (MDHA), and RMHA. Concerning memory utilization, RCMHA is the most frugal, consuming 2.98 GB on average, versus the 3.5 GB required by RMHA.
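In PyTorch-style code, the two modifications read roughly as follows. Tensor shapes, the kernel size, and the learned relative-bias form are my simplifications rather than the exact RCMHA architecture.

```python
# Sketch of RCMHA's two modifications: a depth-wise conv over the input
# embedding before the Q/K/V projections, and a relative positional bias
# added to the attention scores (simplified, not the authors' code).
import torch
import torch.nn as nn

class RCMHASketch(nn.Module):
    def __init__(self, d_model=256, n_heads=4, max_len=512, kernel=3):
        super().__init__()
        self.h, self.dk = n_heads, d_model // n_heads
        # Depth-wise conv along the sequence dimension of the embedding.
        self.dwconv = nn.Conv1d(d_model, d_model, kernel,
                                padding=kernel // 2, groups=d_model)
        self.qkv = nn.Linear(d_model, 3 * d_model)
        # Learned relative position bias, one scalar per offset.
        self.rel_bias = nn.Parameter(torch.zeros(2 * max_len - 1))
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                       # x: (batch, seq, d_model)
        b, n, d = x.shape
        x = self.dwconv(x.transpose(1, 2)).transpose(1, 2)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(b, n, self.h, self.dk).transpose(1, 2)
        k = k.view(b, n, self.h, self.dk).transpose(1, 2)
        v = v.view(b, n, self.h, self.dk).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / self.dk ** 0.5
        idx = torch.arange(n, device=x.device)
        rel = self.rel_bias[idx[None, :] - idx[:, None] + n - 1]  # (n, n)
        attn = torch.softmax(scores + rel, dim=-1)  # scaled dot-product + bias
        y = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.out(y)
```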


Knowing Is Half the Battle

#artificialintelligence

Miniaturization has revolutionized computing ever since the advent of the first digital computer. The machines that once filled entire rooms and required teams of engineers to operate are now vastly dwarfed in capability by the smartphones we carry in our pockets. That trend toward miniaturization is now transforming the field of machine learning as well. Image and voice recognition, object detection, predictive maintenance, and many other applications that once required complex algorithms running on huge computing resources in the cloud can now run on a tiny microcontroller in a low-power IoT device. In part, these feats have been achieved thanks to advances in computing power, but hardware alone is not enough.